[TEST][NO-MERGE] Stress test domain sockets #382
Draft
We found that even when domain sockets are used to get around the port conflict issue, communication is still not very reliable on macOS. With short messages, it gets stuck 6 out of 100k times. With 100KB messages, it gets stuck very frequently (2 out of 100 times).
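For illustration, a stress loop of this kind can be sketched in Python. This is not the test from this PR: the names `serve_echo`, `round_trip`, and `stress` are hypothetical, and the server here buffers the whole message before echoing (an assumption made to avoid both sides blocking on full socket buffers), which differs from a streaming echo server.

```python
import os
import socket
import tempfile
import threading


def serve_echo(listener: socket.socket) -> None:
    """Accept connections one at a time; read until EOF, then echo everything back."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed
        try:
            with conn:
                data = bytearray()
                while chunk := conn.recv(65536):
                    data += chunk
                conn.sendall(data)
        except OSError:
            pass  # client gave up (e.g. timed out); keep serving


def round_trip(path: str, payload: bytes, timeout: float) -> bool:
    """Send payload, half-close to signal EOF, read the echo. False means stuck or failed."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        try:
            client.connect(path)
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # the EOF that the server waits for
            received = bytearray()
            while chunk := client.recv(65536):
                received += chunk
        except OSError:  # includes socket.timeout
            return False
    return bytes(received) == payload


def stress(iterations: int, size: int, timeout: float = 5.0) -> int:
    """Return how many of `iterations` round trips got stuck or came back wrong."""
    payload = os.urandom(size)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "echo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(path)
            listener.listen()
            threading.Thread(target=serve_echo, args=(listener,), daemon=True).start()
            return sum(not round_trip(path, payload, timeout) for _ in range(iterations))
```

Calling `stress(100_000, 16)` and `stress(100, 100_000)` would mirror the short-message and 100KB runs described above; the return value is the number of stuck or failed exchanges.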
The same test (with `nc -N`) passes on Linux (Ubuntu 20.04.6 LTS, Xeon(R) Platinum 8375C).
Netcat client missing EOF
It turns out that the netcat bundled with macOS is pretty old and buggy (there is no version number; `man nc` says 2001, while there is some speculation that it is from 2005). It frequently gets stuck as a domain socket client (the server is a reliable `socat` echo server), especially with large messages (100KB), which is evident with:
nmap `ncat` or `socat` are, on the other hand, reliable clients:
However, neither is bundled on macOS.
Confusingly, this problem seems to go away just by giving the Scala server a read timeout:
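The general idea of a server-side read timeout can be sketched in Python as follows. This is not the Scala server from this PR; `read_until_eof_or_timeout` is a hypothetical helper, and the per-read timeout semantics are an assumption about how such a timeout might behave.

```python
import socket
from typing import Tuple


def read_until_eof_or_timeout(conn: socket.socket, timeout: float) -> Tuple[bytes, bool]:
    """Read until the peer sends EOF, or stop once no data arrives for `timeout` seconds.

    Returns (data, saw_eof). saw_eof is False when the read gave up on the timeout,
    which is the case where a client never delivers its EOF.
    """
    conn.settimeout(timeout)  # applies to each recv() call, not to the whole read
    data = bytearray()
    try:
        while chunk := conn.recv(65536):
            data += chunk
    except socket.timeout:
        return bytes(data), False
    return bytes(data), True
```

With a timeout in place, a client that sends its message but never signals EOF no longer blocks the server forever: the server keeps whatever it has read so far and moves on.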
Netcat client incomplete message
Another (potentially unrelated) issue is that a netcat client pair can result in incomplete messages in bash scripts if the server doesn't send anything back (instead of e.g. echoing):
(There is no obvious way to implement an echo server with macOS netcat)
This doesn't reproduce directly in the Terminal or with nmap `ncat` / `socat` clients:
The root cause is unclear, but it might have to do with the fact that the server doesn't send anything back, causing an incorrectly early termination.
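To make the "server sends nothing back" scenario concrete, here is a minimal Python sketch of a client that half-closes after sending and then waits for the server to close its side before exiting. It does not diagnose the netcat behaviour; `send_and_wait_for_close` and `sink` are hypothetical names, and the sketch only shows one way a client can avoid terminating before a silent server has finished reading.

```python
import socket
import threading


def send_and_wait_for_close(client: socket.socket, payload: bytes) -> None:
    """Send everything, signal EOF, then block until the server closes its end."""
    client.sendall(payload)
    client.shutdown(socket.SHUT_WR)  # server's recv() will now eventually return b""
    while client.recv(4096):         # a silent server sends nothing; b"" means it closed
        pass


def sink(server_side: socket.socket, out: bytearray) -> None:
    """A server that only consumes input and never replies."""
    with server_side:
        while chunk := server_side.recv(65536):
            out += chunk


# Usage: a connected pair of Unix domain sockets stands in for client and server.
client, server_side = socket.socketpair()
received = bytearray()
reader = threading.Thread(target=sink, args=(server_side, received))
reader.start()
with client:
    send_and_wait_for_close(client, b"x" * 100_000)
reader.join()
```

Because the client only returns after seeing the server's EOF, `received` holds the full message by the time the client exits.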
Ncat / Socat
While nmap `ncat` and `socat` clients are reliable on their own, the stress test can still fail due to stuck timeouts. It's unclear whether the problem is in the implementation here or in the `junixsocket` library.
Confusingly, this problem also goes away just by adding the server read timeout mentioned before: